Task Services
The task engine is a service for running batch jobs in the background. It performs a variety of functions primarily covering:
- Publications and Printing: The task engine handles jobs submitted for publication through the scheduling engine (through Publish) as well as user-initiated printing, and performs all the relevant processing and rendering.
- Data Preparation and ETL: The task engine handles all tasks and jobs that are submitted for data processing.
- The task engine also handles other background functions like cleaning operations, authentication synchronization, and a variety of other system-level operations.
Peak vs Off-Peak
Data source performance is heavily affected by concurrency and the number of requests processed at any given time. Background tasks can heavily impact performance for end users if they compete for those resources.
To ensure optimal performance for users during normal business hours, define the period of each day that represents peak hours. With this split in place, the task engine executes fewer jobs during peak times, giving end users better access to system resources when they are actively using the system.
Enter peak hours for each day in the Peak Hours panel at the top of the page.
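The peak / off-peak split can be pictured as a per-day time window. The following is a minimal sketch only: the `PEAK_HOURS` mapping, its values, and the `is_peak` function are hypothetical names chosen for illustration, not part of the product.

```python
from datetime import datetime, time

# Hypothetical schedule: weekday index (Monday=0) -> (start, end) of peak hours.
# Days missing from the mapping are treated as entirely off-peak.
PEAK_HOURS = {
    0: (time(8, 0), time(18, 0)),
    1: (time(8, 0), time(18, 0)),
    2: (time(8, 0), time(18, 0)),
    3: (time(8, 0), time(18, 0)),
    4: (time(8, 0), time(16, 0)),
}


def is_peak(moment: datetime, schedule=PEAK_HOURS) -> bool:
    """Return True if the moment falls inside that day's peak window."""
    window = schedule.get(moment.weekday())
    if window is None:
        return False
    start, end = window
    # The start is inclusive and the end is exclusive.
    return start <= moment.time() < end
```

In this sketch, a Monday at 09:00 counts as peak, while any time on a Saturday counts as off-peak because no window is defined for that day.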
Task Services Setup
Settings can be defined for the entire platform in the middle panel, or per task engine in the cluster (when running in multi-server mode).
Scaling Mode
Select how the cluster allocates resources to each task engine.
Task Engines Management Mode
Where the Task Engines Management Mode drop-down is present, services are running natively on Windows or Linux. In this case, you need to select one of the scaling modes as follows:
- Manual: The administrator manually sets which task servers perform which activities, and the number of threads each runs concurrently during peak and off-peak times.
- By Percentage: The administrator assigns a percentage "coverage" that each task activity should have across all the task servers in the cluster and lets the engine automatically assign resources accordingly.
Typically, if the cluster is built manually, with additional nodes added on an irregular basis, the Manual mode is preferred because it gives administrators control over how task activities are allocated. If the cluster shrinks and grows automatically, the manual approach is infeasible and the automatic approach (By Percentage) must be used instead.
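To illustrate why percentage-based coverage suits a cluster that changes size, here is a rough sketch of one possible allocation rule. The engine's actual assignment logic is not described here, so the `allocate_by_percentage` function, the rounding rule, and the "least-loaded server first" choice are all assumptions made for illustration.

```python
import math


def allocate_by_percentage(servers, coverage):
    """Map each server to the activities it runs.

    servers:  list of server names (hypothetical identifiers).
    coverage: activity name -> percentage (0-100) of servers that should run it.
    """
    assignment = {server: [] for server in servers}
    for activity, percent in coverage.items():
        if not 0 <= percent <= 100:
            raise ValueError(f"coverage for {activity!r} must be between 0 and 100")
        # Assumption: round up, so any non-zero coverage gets at least one server.
        count = math.ceil(len(servers) * percent / 100)
        # Assumption: prefer servers carrying the fewest activities so far;
        # the sort is stable, so ties keep the input order.
        least_loaded = sorted(servers, key=lambda s: len(assignment[s]))
        for server in least_loaded[:count]:
            assignment[server].append(activity)
    return assignment
```

Because the result is computed from the current server list, calling the function again after nodes join or leave yields a fresh allocation without any per-server configuration, which is the property the By Percentage mode relies on.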
Kubernetes Scaling Mode
Where you are using a Kubernetes deployment, there is no Task Engines Management Mode drop-down. In this case, the scaling setup operates by spawning satellite pods automatically in response to additional ETLs being run. The settings that are specific to Kubernetes are:
- Data Processing Pod Memory (GB): The amount of memory (RAM) assigned to the pod by default. You may want to increase it if your ETL is heavy or fails with an out-of-memory error.
- Data Processing Pod CPU: The amount of CPU the pod receives by default.
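For orientation, the two values above correspond conceptually to the `resources` section of a Kubernetes container spec. The sketch below shows how such values could be expressed as Kubernetes resource quantities. How the platform actually applies them is not documented here, so the `pod_resources` helper, the decimal-gigabyte interpretation, and the choice to set requests and limits to the same values are assumptions.

```python
def pod_resources(memory_gb: float, cpu: float) -> dict:
    """Build a Kubernetes-style `resources` mapping from the two settings."""
    if memory_gb <= 0 or cpu <= 0:
        raise ValueError("memory and CPU must be positive")
    # Kubernetes quantities: "M" is decimal megabytes, "m" is millicores.
    # Assumption: the GB setting is decimal (1 GB = 1000 MB).
    memory = f"{round(memory_gb * 1000)}M"
    cpu_quantity = f"{round(cpu * 1000)}m"
    # Assumption: requests and limits are set to the same values.
    quantities = {"memory": memory, "cpu": cpu_quantity}
    return {"requests": dict(quantities), "limits": dict(quantities)}
```

For example, raising the memory setting from 8 to 16 after an out-of-memory failure would change the memory quantity in this sketch from `8000M` to `16000M`.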
Settings
- Peak threads / Off peak threads: The number of tasks that can be run by the task engines during the hours set, on specific days of the week. Generally, the peak threads should be low enough to eliminate resource competition with users, while the off-peak threads can be increased to ensure maximum usage of resources when users are not on the system (both off hours and off days).
- Task Type: In a multi-server configuration, it's possible to designate which type of task each task engine runs: Print / Publish, Data processing / ETL, or both. Designating task type by server gives administrators control over resource allocation, performance settings (at the hardware level), and system throughput.
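The two settings above can be modeled as a small per-engine record. This is a sketch under assumptions: `EngineSettings`, `thread_limit`, `eligible_engines`, and the task type constants are hypothetical names for illustration, not the product's API.

```python
from dataclasses import dataclass

# Hypothetical labels for the two task types.
PRINT_PUBLISH = "print_publish"
DATA_PROCESSING = "data_processing"


@dataclass(frozen=True)
class EngineSettings:
    name: str
    peak_threads: int
    off_peak_threads: int
    # By default an engine accepts both task types.
    task_types: frozenset = frozenset({PRINT_PUBLISH, DATA_PROCESSING})


def thread_limit(settings: EngineSettings, in_peak_hours: bool) -> int:
    """Return how many tasks the engine may run concurrently right now."""
    return settings.peak_threads if in_peak_hours else settings.off_peak_threads


def eligible_engines(engines, task_type: str) -> list:
    """Return the names of the engines designated to run the given task type."""
    return [engine.name for engine in engines if task_type in engine.task_types]
```

A usage example: an engine defined with `peak_threads=2` and `off_peak_threads=8` runs at most two tasks while users are active and up to eight otherwise, and an engine whose `task_types` contains only `DATA_PROCESSING` is never offered Print / Publish jobs.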